Chen Yulin's Blog
DREAM TO CONTROL: LEARNING BEHAVIORS BY LATENT IMAGINATION
Posted 2025-11-22 | Updated 2026-03-08 | Review

Framework

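The paper uses an **RSSM (Recurrent State Space Model)**: an encoder maps observations and actions into a latent state, a transition model predicts future latent states, and a reward model predicts rewards from those latent states.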
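As a compact summary of the three components described above, the world model can be written in the notation of the Dreamer paper as I recall it, where $s_t$ is the latent state, $a_t$ the action, $o_t$ the observation, and $r_t$ the reward. Treat the exact symbols as an assumption of this note rather than a quotation:

$$
\begin{aligned}
\text{Representation model:} \quad & p_\theta(s_t \mid s_{t-1}, a_{t-1}, o_t) \\
\text{Transition model:} \quad & q_\theta(s_t \mid s_{t-1}, a_{t-1}) \\
\text{Reward model:} \quad & q_\theta(r_t \mid s_t)
\end{aligned}
$$

Only the representation model sees $o_t$. The transition and reward models work purely on latent states, which is what makes imagination without observations possible.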

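Advantages:

  • The model can quickly roll out thousands of imagined trajectories in latent space
  • No pixels need to be predicted during imagination, so rollouts are very fast
  • The latent states are Markovian, so each imagined step depends only on the current state and action; since the learned models are neural networks, the whole rollout stays differentiable for planning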
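To make the idea of imagined rollouts concrete, here is a minimal sketch in plain Python. Everything in it is a hypothetical stand-in: `ToyLatentModel`, `toy_policy`, the sizes, and the random weights are illustrative and not taken from the paper. The transition is also deterministic for brevity, whereas the RSSM has a stochastic component. The point is only that each step maps a latent state and an action to the next latent state and a reward, with no image ever decoded.

```python
import math
import random

LATENT_DIM = 4       # illustrative size, not from the paper
ACTION_DIM = 1       # illustrative size
HORIZON = 15         # illustrative imagination horizon
NUM_ROLLOUTS = 1000  # "thousands" scaled down for a quick run


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyLatentModel:
    """Hypothetical stand-in for a learned transition and reward model."""

    def __init__(self, latent_dim, action_dim, seed=0):
        rng = random.Random(seed)
        self.w_state = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)]
                        for _ in range(latent_dim)]
        self.w_action = [[rng.uniform(-0.5, 0.5) for _ in range(action_dim)]
                         for _ in range(latent_dim)]
        self.w_reward = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]

    def transition(self, state, action):
        """Predict the next latent state; no observation is decoded."""
        from_state = matvec(self.w_state, state)
        from_action = matvec(self.w_action, action)
        return [math.tanh(s + a) for s, a in zip(from_state, from_action)]

    def reward(self, state):
        """Predict a scalar reward directly from the latent state."""
        return sum(w * s for w, s in zip(self.w_reward, state))


def toy_policy(state):
    """Hypothetical actor: a fixed squashing of the latent state."""
    return [math.tanh(sum(state))]


def imagine(model, policy, start_state, horizon):
    """Roll the model forward in latent space only."""
    states, rewards = [], []
    state = start_state
    for _ in range(horizon):
        action = policy(state)
        state = model.transition(state, action)
        states.append(state)
        rewards.append(model.reward(state))
    return states, rewards


rng = random.Random(1)
model = ToyLatentModel(LATENT_DIM, ACTION_DIM)
starts = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
          for _ in range(NUM_ROLLOUTS)]
returns = [sum(imagine(model, toy_policy, s, HORIZON)[1]) for s in starts]
print(f"imagined {len(returns)} trajectories of length {HORIZON}")
```

Because each step needs only the current latent state and action, the loop touches a handful of numbers per step instead of a full image, which is where the speed advantage comes from.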

Reparameterization Trick

The key point in Dreamer:

Actions must be differentiable random variables so that gradients can propagate back from the value estimate to the actor.

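If we simply write $a \sim \mathcal{N}(\mu, \sigma^2)$, the sampling step is not differentiable, so the gradient is cut off and the actor cannot learn.
The reparameterization trick instead writes $a=\mu+\sigma\cdot\epsilon$, $\epsilon\sim\mathcal{N}(0,1)$.
Now: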

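  • ε is random, but it does not depend on the actor parameters
  • μ and σ are differentiable network outputs

So the action has a gradient with respect to the actor parameters, which is the basis of differentiable planning.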
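The gradient path through $a=\mu+\sigma\cdot\epsilon$ can be checked numerically with a small sketch. The quadratic `value` function and its `target` are hypothetical stand-ins for a learned value model, chosen so that the exact answer is known in closed form. The gradients are written out by hand with the chain rule ($\partial a/\partial\mu = 1$, $\partial a/\partial\sigma = \epsilon$) instead of using an autodiff library.

```python
import random


def value(action, target=2.0):
    """Hypothetical differentiable value: highest when action == target."""
    return -(action - target) ** 2


def value_grad(action, target=2.0):
    """Derivative dV/da of the toy value above."""
    return -2.0 * (action - target)


def reparam_gradients(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of dE[V]/dmu and dE[V]/dsigma via a = mu + sigma * eps."""
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise, independent of mu and sigma
        action = mu + sigma * eps   # reparameterized sample
        dv_da = value_grad(action)
        grad_mu += dv_da * 1.0      # chain rule with da/dmu = 1
        grad_sigma += dv_da * eps   # chain rule with da/dsigma = eps
    return grad_mu / n_samples, grad_sigma / n_samples


mu, sigma = 0.5, 1.0
g_mu, g_sigma = reparam_gradients(mu, sigma)

# Closed form for this toy value: E[V] = -((mu - target)^2 + sigma^2)
exact_mu = -2.0 * (mu - 2.0)
exact_sigma = -2.0 * sigma

print("dE[V]/dmu    estimate:", g_mu, "exact:", exact_mu)
print("dE[V]/dsigma estimate:", g_sigma, "exact:", exact_sigma)
```

The estimates should land close to the closed-form values, which shows that once the noise is separated from the parameters, an ordinary chain rule gives the actor a usable gradient. In a deep learning framework the same effect comes from sampling ε first and letting automatic differentiation handle the rest.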

http://chen-yulin.github.io/2025/11/22/[OBS]World Model-DREAM TO CONTROL= LEARNING BEHAVIORS BY LATENT IMAGINATION/

Author: Chen Yulin

Posted on: 2025-11-22

Updated on: 2026-03-08

Tags: Research-paper, ML, RL, Embodied-AI, WorldModel
© 2026 Chen Yulin  Powered by Hexo & Icarus